2 November 2016

Urban Observatory Air Quality Data

Urban Observatory Readings

Bayesian Parametric Inference

  • Given data, \(y\), we want to fit a model with parameters \(\theta\)

  • Bayes' theorem allows us to do this

\[p(\theta | y) = \frac{p(\theta)p(y | \theta)}{p(y)} \]

  • \(p(\theta | y)\) is the posterior distribution, \(p(\theta)\) is the prior distribution, \(p(y|\theta)\) is the likelihood, and \(p(y) = \int_\theta p(y | \theta) p(\theta) \textrm{d}\theta\) is the marginal likelihood

  • Once parameters have been determined, the parametric model can be used to answer questions about the process which generated the data, \(y\)

Streaming Data

  • Streaming data is observed as a chronological sequence of values, \(Y\), each with an associated timestamp, \(t_i\)

\[ Y(t_{0:N}) = \{y(t_0), y(t_1), \dots , y(t_N) \} \]

  • Typically streaming data is unbounded:

\[y(t_0), y(t_1), \dots \]

  • We want to forecast future observations, which have not yet been observed

  • We do this by fitting a parametric model which describes the process
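As a sketch, a timestamped observation might be represented as a simple case class, with the unbounded stream modelled by an `Iterator` (the names and the toy signal here are illustrative, not from the talk):

```scala
// A single timestamped observation, y(t_i)
case class Observation(time: Double, value: Double)

// An unbounded stream: an Iterator over a deterministic toy signal
val observations: Iterator[Observation] =
  Iterator.from(0).map(i => Observation(i.toDouble, math.sin(i * 0.1)))

// The stream is lazy; only a finite prefix is ever materialised
val firstThree: List[Observation] = observations.take(3).toList
```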

Irregularly Observed Data

  • Streaming data is often irregularly observed; consider first a completely observed discrete sensor monitoring a lightbulb

  • The lightbulb can be considered either on or off; here is a regularly observed series of its status

  • The lightbulb reports its status every five minutes, leading to \(\frac{60}{5} \times 24 = 288\) observations each day

Discrete Irregularly Observed Data

  • Recording the lightbulb data irregularly can increase accuracy and reduce storage costs

  • In the regularly observed data there are 288 observations per day; in the irregularly observed case a value is recorded only when the status changes, here a single observation

  • With irregularly observed data, we can accurately answer questions like: "at 12pm on Tuesday, how many lights were on in The Core?"
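A query like this can be answered from the irregular record by carrying the last reported status forward; a small standard-library sketch (the event times and helper name are illustrative):

```scala
// Irregular on/off events for one lightbulb: (time, status)
val events: List[(Double, Boolean)] =
  List((0.0, false), (8.5, true), (17.25, false))

// The status at time t is the last status reported at or before t
def statusAt(events: List[(Double, Boolean)], t: Double): Boolean =
  events.takeWhile(_._1 <= t).lastOption.map(_._2).getOrElse(false)
```

Summing `statusAt` over every lightbulb in the building answers the "how many lights were on at 12pm" question directly.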

Modelling Irregularly Observed Data

POMP Models

  • Partially Observed Markov Process model

\[\begin{align*} y(t_i)|\eta(t_i) &\sim \pi(y(t_i) | \eta(t_i)), \\ \eta(t_i)|\textbf{x}(t_i) &= g(F_{t_i}^T \textbf{x}(t_i)), \\ \textbf{X}(t_i) | \textbf{x}(t_{i-1}) &\sim p(\textbf{x}(t_i) | \textbf{x}(t_{i-1})) \end{align*}\]

  • The observation distribution \(\pi(\cdot)\) is flexible
  • The function \(g\) permits a deterministic non-linear transformation of the state space
  • \(F_t\) is a time-dependent vector representing a linear transformation
  • The state space, \(\textbf{x}(t)\), is a continuous-time Markov process

POMP Model

Representation of a POMP model as a directed acyclic graph (DAG)

The State Space

Diffusion Process Plot, Generalised Brownian Motion

The State Space

Diffusion Process Plot, Ornstein-Uhlenbeck Process

Example POMP Model

A Poisson Model

\[\begin{align*} N(t) &\sim \textrm{Poisson}(N(t) | \lambda(t)) \\ \lambda(t) &= \exp\{ x(t) \} \\ \textrm{d}X(t) &= \mu X(t)\textrm{d}t + \sigma \textrm{d}W(t) \end{align*}\]
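A minimal simulation of this model using only the standard library; the Gaussian increments come from `nextGaussian`, the Poisson draw uses Knuth's inversion method, and the state is advanced with an Euler-Maruyama step (an illustrative sketch, not the talk's implementation):

```scala
val rng = new scala.util.Random(1)

// Knuth's inversion method for a Poisson(lambda) draw
def poisson(lambda: Double): Int = {
  val l = math.exp(-lambda)
  var k = 0
  var p = rng.nextDouble()
  while (p > l) { k += 1; p *= rng.nextDouble() }
  k
}

// Euler-Maruyama step for dX(t) = mu X(t) dt + sigma dW(t)
def stepX(x: Double, mu: Double, sigma: Double, dt: Double): Double =
  x + mu * x * dt + sigma * math.sqrt(dt) * rng.nextGaussian()

// Latent path, observed through the intensity lambda(t) = exp(x(t))
val xs: List[Double] =
  Iterator.iterate(0.0)(stepX(_, -0.1, 0.3, 1.0)).take(5).toList
val ns: List[Int] = xs.map(x => poisson(math.exp(x)))
```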

Composing POMP models

Modelling Complex Processes

  • Define an associative binary composition operator for models
  • The left-hand observation model and linking function, \(g\), are chosen
  • The linear transformation vectors are concatenated: \(F^{(3)}_t = \begin{bmatrix} F^{(1)}_t \\ F^{(2)}_t \end{bmatrix}\)
  • The latent state of each model advances according to its state evolution equation:

\[ \textbf{X}(t_i) | \textbf{x}(t_{i-1}) \sim \begin{pmatrix} p_1(\textbf{x}^{(1)}(t_i) | \textbf{x}^{(1)}(t_{i-1})) \\ p_2(\textbf{x}^{(2)}(t_i) | \textbf{x}^{(2)}(t_{i-1})) \end{pmatrix} \]

  • The class of unparameterised models together with the composition operator forms a semigroup
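The composition can be sketched with a drastically simplified model type, carrying only its \(F_t\) vector and state transition (hypothetical types for illustration; the real models also carry observation and linking functions):

```scala
// A simplified "model": its F_t and state transition only
case class Model(f: Double => Vector[Double],
                 step: Vector[Double] => Vector[Double])

// Composition concatenates the F_t vectors and advances each latent
// state independently; the operator is associative, giving a semigroup
def compose(m1: Model, m2: Model): Model = {
  val dim1 = m1.f(0.0).size // state dimension of the left model
  Model(
    t => m1.f(t) ++ m2.f(t),
    x => { val (x1, x2) = x.splitAt(dim1); m1.step(x1) ++ m2.step(x2) }
  )
}

val m1 = Model(t => Vector(1.0), x => x)            // identity-like
val m2 = Model(t => Vector(t), x => x.map(_ + 1.0)) // shifts its state
val m3 = compose(m1, m2)
```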

Composed POMP Example

A Seasonal-Poisson Model

  • Define a seasonal model, with an Ornstein-Uhlenbeck state space

\[\begin{align*} y(t) &\sim \mathcal{N}(y(t) | \mu(t), \sigma) \\ \mu(t) &= F_t^T \textbf{x}(t) \\ \textrm{d}\textbf{X}(t) &= \alpha(\theta - \textbf{X}(t)) \textrm{d}t + \Sigma \textrm{d}W(t) \end{align*}\]

  • \(F_t\) is a time-dependent vector of Fourier components representing seasonality:

\[F_t = \begin{pmatrix} \cos(\omega t) \\ \sin(\omega t) \\ \cos(2\omega t) \\ \sin(2\omega t) \\ \cos(3\omega t) \\ \sin(3\omega t) \end{pmatrix}\]

  • The Poisson model is as previously
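The vector \(F_t\) can be built for any number of harmonics; a small standard-library sketch (the helper name is illustrative):

```scala
// F_t for h harmonics at angular frequency omega:
// (cos(omega t), sin(omega t), cos(2 omega t), sin(2 omega t), ...)
def seasonalVector(h: Int, omega: Double)(t: Double): Vector[Double] =
  (1 to h).toVector.flatMap { i =>
    Vector(math.cos(i * omega * t), math.sin(i * omega * t))
  }

// Three harmonics give the six-element F_t above; a daily period
// corresponds to omega = 2 pi / 24 when t is measured in hours
val ft = seasonalVector(3, 2 * math.Pi / 24)(0.0)
```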

Composed POMP Example

A Seasonal-Poisson Model

\[\begin{align*} N(t_i) &\sim \textrm{Poisson}(N(t_i) | \lambda(t_i)) \\ \lambda(t_i) &= \exp\{ F_{t_i}^T \textbf{x}(t_i) \} \\ \textbf{X}(t_i)|\textbf{x}(t_{i-1}) &\sim \begin{pmatrix} p_1(\textbf{x}^{(1)}(t_i) | \textbf{x}^{(1)}(t_{i-1})) \\ p_2(\textbf{x}^{(2)}(t_i) | \textbf{x}^{(2)}(t_{i-1})) \end{pmatrix} \end{align*}\]

Programming with Streams

Stream

An infinite list

  • Streams can be thought of as an infinite list
  • A Stream is a pair: the left-hand value is the first element of the stream, and the right-hand value is a thunk (a function with no arguments) which produces the rest of the stream
  • Define an (infinite) stream of natural numbers using from
scala> val naturalNumbers = Stream.from(1)

Stream(1, ?)
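The pair-plus-thunk structure can be written out directly; a toy version (illustrative, not the standard library's implementation):

```scala
// head is evaluated eagerly; tail is a thunk, forced only on demand
final case class MyStream(head: Int, tail: () => MyStream)

def from(n: Int): MyStream = MyStream(n, () => from(n + 1))

val nats = from(1)            // nothing beyond the head is computed yet
val second = nats.tail().head // forcing the thunk yields the next element
```

This is why the REPL prints `Stream(1, ?)`: the `?` is the unforced thunk.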

Akka Streams

  • Akka Streams: A Scala library for stream processing

  • Akka streams have three main abstractions:

  • A Source is a definition of a stream; it can be a "pure" stream, or one backed by a database or webservice call

  • A Flow is a processing stage, which can be used to transform a Source to another Source

  • A Sink defines what happens at the end of the stream, usually an effect such as writing to a file

Higher Order Functions

Operations on Streams

  • foldLeft (or foldRight) can be used (with a seed) to reduce a foldable data structure by recursively applying a binary operation
scala> Stream.from(1).
take(10).
foldLeft(0)(_ + _)

55

  • foldLeft will reduce from the left, ((1 + 2) + 3) + 4

  • foldRight will reduce from the right, 1 + (2 + (3 + 4))

  • reduce is equivalent to foldLeft for associative and commutative binary operations and can be applied in parallel

Higher Order Functions

Operations on Streams

  • scanLeft can be used to accumulate a running sum:
scala> Stream.from(1).
take(10).
scanLeft(0)(_ + _).
foreach(n => print(s"$n, "))

0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55

POMP models as a Stream

  • The latent state is a Markov process: the next state depends only on the current state, not on the full history

\[p(x(t_n) | x(t_{n-1}), \dots, x(t_0)) = p(x(t_n) | x(t_{n-1}))\]

  • In order to simulate the Markov Process, we can use the dual of fold, unfold, to simulate a random walk:

\[x(t) = x(t-1) + w(t), \quad w(t) \sim \mathcal{N}(0, W)\]

val x0 = Gaussian(0.0, 1.0).draw

// unfold emits the current state and advances it with a Gaussian increment
Source.unfold(x0)(a => Some((Gaussian(a, sigma).draw, a)))
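Without Akka, the same random walk can be unfolded with the standard library's `Iterator.iterate`, using a seeded `Random` with `nextGaussian` in place of a Breeze `Gaussian` (a sketch, not the talk's code):

```scala
val rng = new scala.util.Random(42)
val sigma = 1.0

// x(t) = x(t-1) + w(t), w(t) ~ N(0, sigma^2)
val walk: Iterator[Double] =
  Iterator.iterate(0.0)(x => x + sigma * rng.nextGaussian())

val path: List[Double] = walk.take(100).toList
```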

Bayesian Inference with Streams

The Bootstrap Particle Filter

  • In general, POMP models require simulation-based filtering (such as the bootstrap filter (Gordon, Salmond, and Smith 1993)) to determine the behaviour of the latent state. Let's describe a bootstrap filter for the following Gaussian model:

\[\begin{align*} y(t_i) &\sim \mathcal{N}(y(t_i) | x(t_i), V) \\ \textrm{d}X(t) &= \mu X(t) \textrm{d}t + \sigma \textrm{d}W(t) \end{align*}\]

  • Initialize: Sample \(N\) particles from the state prior distribution, \(x^{(i)}(t_0) \sim p(x(t_0))\), set the weights to \(1/N\), \(w^{(i)}(t_0) = 1/N\):
val n = 1000 // number of particles
val (x, w) = (Gaussian(0.0, 1.0).sample(n), Seq.fill(n)(1.0 / n))

Bayesian Inference with Streams

The Bootstrap Particle Filter

  • Advance the particles to the time of the next observation, using the state transition model \(x^{(i)}(t_k) \sim p(x^{(i)}(t_k) | x^{(i)}(t_{k-1}))\)
def stepState(mu: Double, sigma: Double)
  (x: State, dt: TimeIncrement): State = {
    // Gaussian takes a mean and a standard deviation
    Gaussian(x + mu * dt, sigma * math.sqrt(dt)).draw
}
val x1 = x map (stepState(mu, sigma)(_, dt))
  • Calculate the likelihood (weight) of each particle, given the observation at time \(t_k\), \(w^{(i)}(t_k) = p(y(t_k) | x(t_k))\)
val weights = x1 map (Gaussian(_, v).pdf(observation))

Bayesian Inference with Streams

The Bootstrap Particle Filter

  • Resample the particles according to the weights, i.e. select particle \(j\) with probability \(w^{(j)}(t_k)\)
def resample(w: Seq[LogLikelihood], x: Seq[State]) = {
  // exponentiate the log-weights before sampling particle indices
  Multinomial(DenseVector(w.map(math.exp): _*)).sample(x.size) map (i => x(i))
}
  • Store the approximate sample from the filtering distribution, \(\hat{p}(x(t_k) | y(t_{1:k}))\)

  • Advance to the time of the next observation and repeat, without the initialization step
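The resampling step above can also be sketched with the standard library alone, by inverting the cumulative normalised weights (an illustrative multinomial resampler; the name and the guard against floating-point rounding are my additions):

```scala
// Select particle j with probability proportional to w(j)
def resampleSketch(rng: scala.util.Random,
                   w: Seq[Double], x: Seq[Double]): Seq[Double] = {
  val total = w.sum
  val cumulative = w.map(_ / total).scanLeft(0.0)(_ + _).tail
  Seq.fill(x.size) {
    val u = rng.nextDouble() // u in [0, 1)
    val idx = cumulative.indexWhere(u < _)
    x(if (idx < 0) x.size - 1 else idx) // guard against rounding of the last weight
  }
}
```

With a degenerate weight vector all mass falls on one particle, so every resampled value is a copy of it.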

Bayesian Inference with Streams

The Bootstrap Particle Filter

  • Here's the full implementation of a single step of the bootstrap particle filter for the simple Gaussian model
case class Data(time: Datetime, observation: Double)
case class FilterState(t0: Datetime, state: Seq[State])

def stepFilter(v: Double)(s: FilterState, d: Data): FilterState = {
  val dt = d.time - s.t0
  // mu and sigma, the state parameters, are assumed in scope
  val state = s.state map (x => stepState(mu, sigma)(x, dt))
  val w = state map (x => Gaussian(x, v).logPdf(d.observation))

  FilterState(d.time, resample(w, state))
}

Bayesian Inference With Streams

The Bootstrap Particle Filter

  • In order to perform the bootstrap particle filter, we need to know the time of the previous observation and the previous particle cloud

  • The filter can then be run using an Akka Flow and the scan operation; the initial state init contains the particle cloud and the time at the start of the filter's application

def filter(v: Double, init: FilterState) = 
  Flow[Data].scan(init)(stepFilter(v))

dataStream.
  via(filter(v, init)).
  runWith(Sink.foreach(println))
  • The particle filter is applied to the data stream (a Source) one element at a time, printing the result to the console as each data point is processed

Bayesian Inference With Streams

Parallel Particle Filters

  • To account for the uncertainty in the parameter estimates, we can sample from the parameter posterior \(p(\theta | y(t_{1:N}))\) and run one particle filter per sample

  • Running multiple particle filters (in parallel) is straightforward; assume variance represents the posterior distribution of the observation variance:

Source(variance.sample(4).toList).
  mapAsync(2){ v =>
    Source(sims).
      take(100).
      via(filter(v, init)).
      runWith(Sink.ignore)
  }
  • The full implementation can be found in the associated GitHub repo: git.io/scalable

Illustration of Bootstrap Particle Filter

  • The solid line represents the simulated state; the points represent the particle cloud, with opacity representing each particle's likelihood

Parameter Inference for POMP models

  • Offline Parameter Inference:
    • MCMC algorithms: Particle Marginal Metropolis Hastings (Andrieu and Roberts 2009)
  • Online Parameter Inference:
    • Liu and West (Liu and West 2001)
    • Storvik Filter (Storvik 2002)
    • Particle Learning (Carvalho et al. 2010)

Conclusion

  • Bayesian Inference for irregularly observed time series
    • Improves accuracy of inference
    • Reduces computational cost
    • Regularly observed data is simply a special case

Further Reading

References

Andrieu, Christophe, and Gareth O. Roberts. 2009. “The pseudo-marginal approach for efficient Monte Carlo computations.” Annals of Statistics 37 (2): 697–725. doi:10.1214/07-AOS574.

Carvalho, Carlos M., Michael S. Johannes, Hedibert F. Lopes, and Nicholas G. Polson. 2010. “Particle Learning and Smoothing.” Statistical Science 25 (1): 88–106. doi:10.1214/10-STS325.

Gordon, N. J., D. J. Salmond, and A. F. M. Smith. 1993. “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation.” IEE Proceedings F Radar and Signal Processing 140 (2): 107. doi:10.1049/ip-f-2.1993.0015.

Liu, Jane, and Mike West. 2001. “Combined Parameter and State Estimation in Simulation-Based Filtering.” Springer, 197–223.

Storvik, Geir. 2002. “Particle filters for state-space models with the presence of unknown static parameters.” IEEE Transactions on Signal Processing 50 (2): 281–89. doi:10.1109/78.978383.